MaxSSmap: A GPU program for mapping divergent short reads to genomes with the maximum scoring subsequence



Abstract

Programs based on hash tables and Burrows-Wheeler are very fast for mapping short reads to genomes but have low accuracy in the presence of mismatches and gaps. Such reads can be aligned accurately with the Smith-Waterman algorithm but it can take hours and days to map millions of reads even for bacteria genomes. We introduce a GPU program called MaxSSmap with the aim of achieving comparable accuracy to Smith-Waterman but with faster runtimes. Similar to most programs MaxSSmap identifies a local region of the genome followed by exact alignment. Instead of using hash tables or Burrows-Wheeler in the first part, MaxSSmap calculates maximum scoring subsequence score between the read and disjoint fragments of the genome in parallel on a GPU and selects the highest scoring fragment for exact alignment. We evaluate MaxSSmap's accuracy and runtime when mapping simulated Illumina E. coli and human chromosome one reads of different lengths and 10% to 30% mismatches with gaps to the E. coli genome and human chromosome one. We also demonstrate applications on real data by mapping ancient horse DNA reads to modern genomes and unmapped paired reads from NA12878 in 1000 genomes. We show that MaxSSmap attains comparable high accuracy and low error to fast Smith-Waterman programs yet has much lower runtimes. We show that MaxSSmap can map reads rejected by BWA and NextGenMap with high accuracy and low error much faster than if Smith-Waterman were used. On short read lengths of 36 and 51 both MaxSSmap and Smith-Waterman have lower accuracy compared to at higher lengths. On real data MaxSSmap produces many alignments with high score and mapping quality that are not given by NextGenMap and BWA. The MaxSSmap source code is freely available from http://www.cs.njit.edu/usman/MaxSSmap.
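
The first stage described in the abstract (scoring the read against disjoint genome fragments and keeping the best one) can be illustrated with a small CPU sketch. This is not the MaxSSmap source: the function names, the +1/-1 match/mismatch scores, the ungapped comparison at each offset, and the plain Python loop standing in for GPU threads are all assumptions made for illustration. The exact alignment step that follows is omitted.

```python
def max_scoring_subsequence(scores):
    """Largest sum of any contiguous run of scores (0 if every run is negative)."""
    best = current = 0
    for s in scores:
        current = max(0, current + s)  # drop the running sum once it goes negative
        best = max(best, current)
    return best


def fragment_score(read, genome, start, frag_len, match=1, mismatch=-1):
    """Best maximum scoring subsequence over all read offsets that start in one fragment.

    The match/mismatch values are illustrative assumptions, not the paper's settings.
    """
    best = 0
    # Offsets begin inside the fragment; the read may extend past the fragment's end.
    end = min(start + frag_len, len(genome) - len(read) + 1)
    for offset in range(start, end):
        window = genome[offset:offset + len(read)]
        scores = [match if a == b else mismatch for a, b in zip(read, window)]
        best = max(best, max_scoring_subsequence(scores))
    return best


def best_fragment(read, genome, frag_len):
    """Return (fragment_start, score) for the highest-scoring disjoint fragment.

    On a GPU each fragment would be scored by its own thread; here a loop
    scores them one after another. Ties go to the leftmost fragment.
    """
    scored = [(fragment_score(read, genome, s, frag_len), s)
              for s in range(0, len(genome), frag_len)]
    score, start = max(scored, key=lambda t: (t[0], -t[1]))
    return start, score


if __name__ == "__main__":
    genome = "A" * 10 + "CGTCCGTGCT" + "A" * 10
    read = "CGTCCGAGCT"  # one mismatch relative to the middle of the genome
    print(best_fragment(read, genome, frag_len=10))
```

Because each fragment is scored independently of the others, the work splits naturally across parallel threads, which is the property the abstract relies on for GPU execution.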

Bibliographic details

  • Authors

    Turki, Turki; Roshan, Usman

  • Year: 2014
  • Original format: PDF
  • Language: English
